Abstract. The idea of using metaplastic synapses to incorporate the separate storage
of long- and short-term memories via an array of hidden states was put forward in the 
cascade model of Fusi et al. In this paper, we devise and investigate two models of a 
metaplastic synapse based on these general principles. The main difference between 
the two models lies in their available mechanisms of decay, when a contrarian event 
occurs after the build-up of a long-term memory. In one case, this leads to the 
conversion of the long-term memory to a short-term memory of the opposite kind, 
while in the other, a long-term memory of the opposite kind may be generated as a 
result. Appropriately enough, the response of both models to short-term events is not
affected by this difference in architecture. By contrast, the transient response of
both models, after long-term memories have been created by the passage of sustained
signals, is rather different. The asymptotic behaviour of both models is, however,
characterised by power-law forgetting with the same universal exponent.

1. Introduction

Human memories are known to be fickle, but they are also capable of being elephantine. 
While research on memory has a long history in psychology [1], it is only
relatively recently that it has been attacked from an interdisciplinary perspective. The
seminal work of Amit and collaborators [2, 3, 4] on neural networks was in large part 
responsible for opening up the field to physicists [5]; much the same can be said about 
the work of Hopfield [6]. The optimisation of learning on complex neuronal networks has 
been a field in itself; it has generally assumed that memories are stored via the abrupt 
change that occurs in the synapses connecting neurons, when they are exposed to a 
particular pattern. This picture is premised on the notion of binary synapses ('synaptic 
switches'), which are a natural approximation to synapses possessing a finite set of 
discrete states. There is some experimental evidence [7, 8] in their support, and they 
have also been extensively used in earlier mathematical models (see e.g. [9, 10, 11]). 

The above mechanism of synaptic plasticity has, however, been shown to be rather 
inefficient when synapses change permanently [12]. Pure plasticity indeed does not 
provide a mechanism for protecting some memories while leaving room for other, newer, 
memories to come in, hence the need for the mechanism of metaplasticity [3]. In order 
to improve performance, Fusi et al [13] proposed a cascade model of a synapse with 
many hidden states, which they claimed was able to store long-term memories more 
efficiently, with a decay that was power-law rather than exponential in time. Such 
power-law forgetting has in fact also been observed experimentally [14, 15] (albeit at 
a behavioural rather than a synaptic level). This issue forms the focus of the current 
paper, where we also put Fusi et al's cascade model on a more quantitative basis, by
analysing it in greater detail than has been done in either the
original work or in subsequent papers. Another aim of our work is to see whether the
introduction of architectural differences might induce important differences in behaviour: 
we accordingly devise a model which has a different mechanism for the decay of long- 
term memories, compared to the one of Fusi et al, and compare the two models. 

The plan of this paper is as follows. In Section 2 we define both models to be 
investigated. Model I is an extension of the original cascade model by Fusi et al, whereas 
Model II has a different architecture. Both models however share the common feature 
that all the transition probabilities decay exponentially with the level depth of the 
hidden states. Section 3 presents the formalism of Markov chains used in this work. 
The default states of the two models are studied in Section 4. This allows us to identify 
some useful parameters, which include static and dynamical lengths $\xi_s$ and $\xi_d$ relevant to
the problem. Section 5 is devoted to the response of both models to a single long-term 
potentiating (LTP) input signal and to a DC signal (sustained LTP signal); here we also 
provide an investigation of universal asymptotic power-law forgetting (common to both 
models) and of the non-universal transient forgetting specific to Model II. In Section 6 
we study the signal-to-noise ratio which emerges from an investigation of fluctuations 
around the default state, while in Section 7 we illustrate the response of the models 
to a selection of specific time-dependent input signals. While some of these signals
may be seen to be biologically unrealistic, they are necessary for a systematic study of 
our models, viewed from a physicist's perspective as signal processing units. Finally, 
we discuss our results in Section 8. Appendix A contains a detailed investigation of 
the problem of the logarithmic walker, whereas Appendix B examines the transient 
behaviour of both models, and includes a derivation of the non-universal transient 
exponent of Model II. 

2. The models 

In this section, we define the models to be studied and introduce some of the ideas 
relevant to our investigations. Synapses can respond differently to an incoming action 
potential, in a way that could change with time [16]: if a particular stimulation paradigm 
leads to a persistent increase in response, this leads to the long-term potentiation of 
synapses (LTP), whereas long-term depression (LTD) corresponds to the opposite limit. 
This change in the strength of a synapse from a weak to a strong state and vice versa 
is referred to as synaptic plasticity and forms the basis of the current understanding of 
learning and memory, when applied to the many interconnected networks of synapses 
in the brain. If synapses are highly plastic, memories are quickly stored: however, high 
plasticity also means that more and more memories are stored, generating enough noise 
so that earlier memories are soon irretrievable. Clearly, this is at variance with the fact 
that long-term memories are quite ubiquitous in human experience; it was to resolve 
this paradox that Fusi et al [13] devised the cascade model which is the motivation for 
the present paper. 

The pathbreaking idea behind the work of Fusi et al was that the introduction 
of 'hidden states' for a synapse would enable the delinking of memory lifetimes from 
instantaneous signal response: while maintaining quick learning, it would also be able 
to allow slow forgetting. In the original cascade model of [13], this was implemented 
by the storage of memories at different 'levels': the relaxation times for the memories 
increased as a function of depth. It was assumed that short-term memories, stored at 
the uppermost levels, would decay as a consequence of their replacement by other short- 
term memories ('noise'). On the other hand, longer-lasting memories remained largely 
immune to such noise as they were stored at the deeper levels, which were accessible only 
rarely. This hierarchy of timescales models the phenomenon of metaplasticity [17, 18]. 

In this work, we make a detailed comparison of two different models of a metaplastic 
binary synapse with infinitely many hidden states (levels), labelled by their depth 
$n = 0, 1, \dots$ At every discrete time step $t$, the synapse is subjected either to an LTP
signal (encoded as $\epsilon(t) = +1$) or to an LTD signal (encoded as $\epsilon(t) = -1$), where
$\epsilon(t) = \pm 1$ is the instantaneous value of the input signal at time $t$.

The first model (Model I), defined in Figure 1, is an extension of the original 
cascade model proposed by Fusi et al [13]. The application of an LTP signal can have
three effects:

• If the synapse is in its − state at depth $n$, it may climb one level ($n \to n - 1$) with
probability $\alpha_n$. (This move was absent in the original model.)

• If it is in its − state at depth $n$, it may alternatively hop to the uppermost + state
with probability $\beta_n$.

• If it is already in its + state at depth $n$, it may fall one level ($n \to n + 1$) with
probability $\gamma_n$.

Figure 1. Schematic representation of Model I. Arrows denote possible transitions
in the presence of an LTP signal ($\epsilon = +1$, left panel) and of an LTD signal ($\epsilon = -1$,
right panel). Corresponding transition probabilities are indicated. In each panel, the
left (resp. right) column corresponds to the − (resp. +) state. The model studied in
this work is actually infinitely deep.



Long-term memories will be stored in the deepest levels of the synapse, because 
of the persistent application of characteristic signals. The effect of noise on such a 
long-term memory is, in the context of this model, to replace a long-term memory by 
a short-term memory of the opposite kind. If, for example, the signal is composed
entirely of LTP events, an isolated LTD event could be seen to represent the effect of
noise. In this case, the Fusi model predicts that the signal is thrown from a deep positive 
level of the synapse to the uppermost level of the negative pole. Seen differently, this 
mechanism converts a long-term memory of one kind to a short-term memory of the 
opposite kind. 

It is however plausible that long-term memories of one kind could be replaced by 
long-term memories of another kind (e.g. if a sudden event causes an abrupt change 
that is in its turn long-lasting). Our Model II, defined in Figure 2, implements this 
mechanism. The three outcomes of the application of an LTP signal are now as follows: 

• If the synapse is in its − state at depth $n$, it may climb one level ($n \to n - 1$) with
probability $\alpha_n$.

• If it is in its − state at depth $n$, it may alternatively cross over to the + state at the
same level with probability $\beta_n$.

• If it is already in its + state at depth $n$, it may fall one level ($n \to n + 1$) with
probability $\gamma_n$.

Figure 2. Schematic representation of Model II. Same conventions as in Figure 1.

Along the lines of Fusi et al [13], the transition probabilities of both models are 
assumed to decay exponentially with level depth n: 

$$\alpha_n = \alpha\, e^{-(n-1)\mu_d}, \qquad \beta_n = \beta\, e^{-n\mu_d}, \qquad \gamma_n = \gamma\, e^{-n\mu_d}. \tag{2.1}$$

The corresponding characteristic length,

$$\xi_d = \frac{1}{\mu_d}, \tag{2.2}$$

is one of the key ingredients of the models; it measures the number of fast levels at
the top of the synapse. It will be referred to as the dynamical length of the problem. The
choice made in [13] corresponds to $e^{-\mu_d} = 1/2$, i.e., $\mu_d = \ln 2$. A different characteristic
length, the static length $\xi_s$, giving the number of occupied levels in the default state of
the synapse, will be introduced in Section 4.
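As a minimal sketch (not part of the original analysis), the rates (2.1) and the dynamical length (2.2) can be coded as follows. The function names and the numerical values of $\alpha$, $\beta$, $\gamma$ are illustrative assumptions, and setting $\alpha_0 = 0$ is an assumed convention reflecting the absence of any level above $n = 0$.

```python
import math

def rates(n, alpha, beta, gamma, mu_d):
    """Return (alpha_n, beta_n, gamma_n) at depth n, following (2.1).

    Assumption: alpha_0 is set to 0, since there is no level above n = 0.
    """
    alpha_n = alpha * math.exp(-(n - 1) * mu_d) if n >= 1 else 0.0
    beta_n = beta * math.exp(-n * mu_d)
    gamma_n = gamma * math.exp(-n * mu_d)
    return alpha_n, beta_n, gamma_n

def dynamical_length(mu_d):
    """Dynamical length xi_d = 1 / mu_d, see (2.2)."""
    return 1.0 / mu_d

# Choice of [13]: e^{-mu_d} = 1/2, so each level is twice as slow as the one above.
mu_d = math.log(2.0)
# Illustrative (hypothetical) amplitudes alpha, beta, gamma.
a1, b1, g1 = rates(1, alpha=0.3, beta=0.2, gamma=0.5, mu_d=mu_d)
a2, b2, g2 = rates(2, alpha=0.3, beta=0.2, gamma=0.5, mu_d=mu_d)
print(g2 / g1, dynamical_length(mu_d))
```

With $\mu_d = \ln 2$ all three rates halve from one level to the next, which is the sense in which $\xi_d$ counts the fast levels at the top of the synapse.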

3. Formalism 

We will make a detailed comparative analysis of Model I and Model II, with a view to 
establishing similarities and differences associated with their respective architectures. 
In both cases the synapse is considered to be infinitely deep, with levels numbered by
$n = 0, 1, \dots$ We use the language of stochastic processes [19], and in particular the
formalism of inhomogeneous Markov chains.‡

‡ This formalism is a discrete-time analogue of that used extensively in the mathematical literature,
to study e.g. birth and death processes or queuing processes [20, 21].

The basic quantities are the probabilities $P_n(t)$ (resp. $Q_n(t)$) for the synapse to be
in the − state (resp. in the + state) at level $n = 0, 1, \dots$ at time $t = 0, 1, \dots$ These
probabilities can be combined in order to form quantities of interest:

• Probability for the synapse to be in the − state (resp. in the + state) at time $t$,
irrespective of level:

$$P(t) = \sum_{n\ge0} P_n(t), \qquad Q(t) = \sum_{n\ge0} Q_n(t) = 1 - P(t). \tag{3.1}$$

• Probability of being at level $n$ at time $t$, irrespective of state:

$$S_n(t) = P_n(t) + Q_n(t). \tag{3.2}$$

• Mean level depth:

$$\langle n(t)\rangle = \sum_{n\ge0} n\, S_n(t). \tag{3.3}$$

• Level-resolved polarisation (output signal) of level $n$ and total polarisation of the
synapse at time $t$:

$$D_n(t) = Q_n(t) - P_n(t), \qquad D(t) = \sum_{n\ge0} D_n(t) = Q(t) - P(t). \tag{3.4}$$

We have the inequalities

$$|D_n(t)| \le S_n(t), \qquad |D(t)| \le 1. \tag{3.5}$$

The probabilities $P_n(t)$ and $Q_n(t)$ obey the following dynamical equations, whose
form is characteristic of Markov chains:

• Model I, $\epsilon(t + 1) = +1$ (see Figure 1, left):

$$\begin{aligned}
P_n(t+1) &= (1 - \alpha_n - \beta_n)\,P_n(t) + \alpha_{n+1}\,P_{n+1}(t), \\
Q_n(t+1) &= (1 - \gamma_n)\,Q_n(t) + \gamma_{n-1}\,Q_{n-1}(t) + \delta_{n0}\,\mathcal{P}(t).
\end{aligned} \tag{3.6}$$

• Model I, $\epsilon(t + 1) = -1$ (see Figure 1, right):

$$\begin{aligned}
P_n(t+1) &= (1 - \gamma_n)\,P_n(t) + \gamma_{n-1}\,P_{n-1}(t) + \delta_{n0}\,\mathcal{Q}(t), \\
Q_n(t+1) &= (1 - \alpha_n - \beta_n)\,Q_n(t) + \alpha_{n+1}\,Q_{n+1}(t),
\end{aligned} \tag{3.7}$$

with

$$\mathcal{P}(t) = \sum_{n\ge0} \beta_n P_n(t), \qquad \mathcal{Q}(t) = \sum_{n\ge0} \beta_n Q_n(t). \tag{3.8}$$

• Model II, $\epsilon(t + 1) = +1$ (see Figure 2, left):

$$\begin{aligned}
P_n(t+1) &= (1 - \alpha_n - \beta_n)\,P_n(t) + \alpha_{n+1}\,P_{n+1}(t), \\
Q_n(t+1) &= (1 - \gamma_n)\,Q_n(t) + \gamma_{n-1}\,Q_{n-1}(t) + \beta_n\,P_n(t).
\end{aligned} \tag{3.9}$$

• Model II, $\epsilon(t + 1) = -1$ (see Figure 2, right):

$$\begin{aligned}
P_n(t+1) &= (1 - \gamma_n)\,P_n(t) + \gamma_{n-1}\,P_{n-1}(t) + \beta_n\,Q_n(t), \\
Q_n(t+1) &= (1 - \alpha_n - \beta_n)\,Q_n(t) + \alpha_{n+1}\,Q_{n+1}(t).
\end{aligned} \tag{3.10}$$

4. Default state and parameter space

We here investigate the default state of the synapse, which is the average stationary state 
in the presence of a white-noise input signal. White-noise input is defined by choosing 
at each time step

$$\epsilon(t) = \begin{cases} +1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2. \end{cases} \tag{4.1}$$

In the presence of a random input $\epsilon(t)$, the probabilities $P_n(t)$ and $Q_n(t)$ are
themselves random. We first evaluate the average response of the synapse, encoded
in the mean values of $P_n(t)$ and $Q_n(t)$ with respect to the random input signal. For
simplicity, we continue to use the notation $P_n(t)$ and $Q_n(t)$ for the average probabilities,
and $S_n(t)$ and $D_n(t)$ for their sums and differences. As the values $\epsilon(t)$ of the input signal
are independent of each other, the equations obeyed by the mean probabilities are the
arithmetical means of (3.6) and (3.7) for Model I, and of (3.9) and (3.10) for Model II.
The quantities $S_n(t)$ and $D_n(t)$ characterising the average response therefore obey:
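• Model I:

$$\begin{aligned}
S_n(t+1) &= S_n(t) + \tfrac12\big(\gamma_{n-1}S_{n-1}(t) + \alpha_{n+1}S_{n+1}(t)\big) - \tfrac12(\alpha_n + \beta_n + \gamma_n)\,S_n(t) + \tfrac12\,\delta_{n0}\,\mathcal{S}(t), \\
D_n(t+1) &= D_n(t) + \tfrac12\big(\gamma_{n-1}D_{n-1}(t) + \alpha_{n+1}D_{n+1}(t)\big) - \tfrac12(\alpha_n + \beta_n + \gamma_n)\,D_n(t) - \tfrac12\,\delta_{n0}\,\mathcal{D}(t),
\end{aligned} \tag{4.2}$$

with

$$\mathcal{S}(t) = \sum_{n\ge0} \beta_n S_n(t), \qquad \mathcal{D}(t) = \sum_{n\ge0} \beta_n D_n(t). \tag{4.3}$$

• Model II:

$$\begin{aligned}
S_n(t+1) &= S_n(t) + \tfrac12\big(\gamma_{n-1}S_{n-1}(t) + \alpha_{n+1}S_{n+1}(t)\big) - \tfrac12(\alpha_n + \gamma_n)\,S_n(t), \\
D_n(t+1) &= D_n(t) + \tfrac12\big(\gamma_{n-1}D_{n-1}(t) + \alpha_{n+1}D_{n+1}(t)\big) - \tfrac12(\alpha_n + 2\beta_n + \gamma_n)\,D_n(t).
\end{aligned} \tag{4.4}$$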

The default state is characterised by the time-independent solution to (4.2) or (4.4).
The latter is of the form

$$S_n^{\rm st} = (1 - e^{-\mu_s})\,e^{-n\mu_s}, \qquad D_n^{\rm st} = 0, \tag{4.5}$$

i.e.,

$$P_n^{\rm st} = Q_n^{\rm st} = \tfrac12(1 - e^{-\mu_s})\,e^{-n\mu_s}. \tag{4.6}$$

The default state is appropriately featureless. It is unpolarised, as it should be for a
symmetric synapse. Furthermore, the occupation probabilities obey a simple exponential
falloff as a function of level depth. The corresponding characteristic length,

$$\xi_s = \frac{1}{\mu_s}, \tag{4.7}$$

is referred to as the static length of the problem, and gives a measure of the effective
number of occupied levels in the default state. The regime of most interest is where $\xi_s$ is
moderately large, so that the default state extends over several levels. The mean level
depth

$$\langle n\rangle_{\rm st} = \frac{1}{e^{\mu_s} - 1} = \xi_s - \frac12 + \cdots \tag{4.8}$$

is then essentially given by the static length.

The key role played by two characteristic lengths, static ($\xi_s$) and dynamic ($\xi_d$),
is a striking similarity between this model and that of a column of interacting grains
investigated previously [22].

In contrast to the dynamical length $\xi_d$, which is a free parameter, the static length $\xi_s$
is related to the values of the parameters $\alpha$, $\beta$, and $\gamma$ in a model-dependent way. Thus:

• Model I:

$$\gamma = \alpha\, e^{-\mu_s} + \frac{\beta}{e^{\mu_s+\mu_d} - 1}. \tag{4.9}$$

• Model II:

$$\gamma = \alpha\, e^{-\mu_s}. \tag{4.10}$$

The above equations reveal the main difference between the two models at the level 
of the default state. The stationarity of the latter state involves balancing out 'upward' 
and 'downward' moves arbitrarily deep within the system. This goal is achieved in 
different ways in both models, consistent with their structural differences. 
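The stationarity of the default state can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it applies one white-noise-averaged step for the occupations $S_n$ to the exponential profile (4.5), with $\alpha$ fixed by (4.9) or (4.10). The truncation to a finite number of levels with a closed bottom, the convention $\alpha_0 = 0$, the value $\beta = 0.2$ and all names are assumptions made for the example.

```python
import math

def averaged_step_S(S, model, alpha, beta, gamma, mu_d):
    """One white-noise-averaged step for the occupations S_n, cf. (4.2) and (4.4).

    Assumptions (not from the text): truncation to len(S) levels with a closed
    bottom level, and alpha_0 = 0 because no level lies above n = 0.
    """
    N = len(S)
    a = [0.0] + [alpha * math.exp(-(n - 1) * mu_d) for n in range(1, N)]
    b = [beta * math.exp(-n * mu_d) for n in range(N)]
    g = [gamma * math.exp(-n * mu_d) for n in range(N)]
    g[N - 1] = 0.0  # closed bottom: nothing falls out of the last level
    new = []
    for n in range(N):
        gain = g[n - 1] * S[n - 1] if n > 0 else 0.0
        if n < N - 1:
            gain += a[n + 1] * S[n + 1]
        loss = (a[n] + g[n]) * S[n]
        if model == 1:
            loss += b[n] * S[n]          # Model I: hops leave level n ...
        new.append(S[n] + 0.5 * (gain - loss))
    if model == 1:
        new[0] += 0.5 * sum(bn * Sn for bn, Sn in zip(b, S))  # ... and land at n = 0
    return new

mu_s = mu_d = 0.2
beta, gamma = 0.2, 0.5
E = math.exp(mu_s + mu_d)
alpha_I = math.exp(mu_s) * (gamma - beta / (E - 1.0))   # (4.9) solved for alpha
alpha_II = math.exp(mu_s) * gamma                        # (4.10) solved for alpha

N = 200
S_st = [(1.0 - math.exp(-mu_s)) * math.exp(-n * mu_s) for n in range(N)]

def drift(model, alpha):
    """Largest one-step change of the profile; close to 0 if S_st is stationary."""
    S_new = averaged_step_S(S_st, model, alpha, beta, gamma, mu_d)
    return max(abs(x - y) for x, y in zip(S_new, S_st))

print(drift(1, alpha_I), drift(2, alpha_II))
```

In Model II the balance holds level by level (each downward flux is matched by the upward flux across the same bond), whereas in Model I the local imbalance at every depth is compensated only by the non-local reinjection at $n = 0$.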

In Model II, a large static length $\xi_s$ is reached, irrespective of $\beta$, when $\alpha$ and $\gamma$ are
nearly equal, with a small bias in the upward direction:

$$\alpha - \gamma = (e^{\mu_s} - 1)\,\gamma \approx \frac{\gamma}{\xi_s}. \tag{4.11}$$

The situation is very different for Model I, where non-local reinjection plays a key role.
The stationary profile of the response may become critical (i.e., $\xi_s \to \infty$) when a strong
local downward bias is compensated by strongly non-local upward moves:

$$\gamma - \alpha = \frac{\beta}{e^{\mu_d} - 1} \approx \xi_d\,\beta. \tag{4.12}$$

This phenomenon is already at work in the original model by Fusi et al, where $\alpha = 0$.

We now discuss the parameter space of both models. The essential parameters are
the static and dynamical lengths $\xi_s$ and $\xi_d$, whose typical values are a few units. For
fixed $\xi_s$ and $\xi_d$, $\alpha$, $\beta$, and $\gamma$ are related by (4.9) or (4.10). We choose to take $\beta$ and $\gamma$
as our independent parameters. Besides the condition that each of them is between 0
and 1, they also fulfil (i) $\alpha \ge 0$ and (ii) $\alpha_1 + \beta_1 \le 1$ (see (3.6) or (3.9) for $P_1(t+1)$). For
each model, the admissible values of $\beta$ and $\gamma$ belong to a quadrangular domain EFGH,
shown in Figure 3 for $\xi_s = \xi_d = 5$. In both cases, saturating condition (ii) yields the EH
line. The non-trivial coordinates of the vertices as well as some special features, in the
case of each model, are given below.

• Model I:

$$\gamma_E = e^{-\mu_s}, \qquad \beta_G = e^{\mu_s+\mu_d} - 1, \qquad \beta_H = e^{\mu_d}(e^{\mu_s} - 1)(e^{\mu_s+\mu_d} - 1). \tag{4.13}$$

The maximal value of $\beta$ for a fixed $\gamma$ lies on the FG line. This is the defining line for
the original model of Fusi et al, corresponding to the choice $\alpha = 0$:

$$\beta_{\max}(\gamma) = (e^{\mu_s+\mu_d} - 1)\,\gamma. \tag{4.14}$$

• Model II:

$$\gamma_E = e^{-\mu_s}, \qquad \gamma_H = e^{-\mu_s}(1 - e^{-\mu_d}). \tag{4.15}$$

The maximal value of $\beta$ for a fixed $\gamma$ lies on the (broken) EHG line:

$$\beta_{\max}(\gamma) = \min\!\big(e^{\mu_d}(1 - e^{\mu_s}\gamma),\, 1\big). \tag{4.16}$$

Figure 3. Domains of admissible values of $\beta$ and $\gamma$ for both models with $\xi_s = \xi_d = 5$.

The above expressions (4.14) and (4.16) for $\beta_{\max}(\gamma)$ cross at the following critical
value of $\gamma$:

$$\gamma_c = \frac{e^{\mu_d}}{2e^{\mu_s+\mu_d} - 1}, \tag{4.17}$$

so that Model I has a smaller (resp. larger) $\beta_{\max}(\gamma)$ for $\gamma < \gamma_c$ (resp. $\gamma > \gamma_c$). This is a
result to bear in mind, as it turns out that the behaviour of many quantities of interest
is largely determined by $\beta_{\max}(\gamma)$ (see e.g. Figures 11 and 12).

Throughout the following, in numerical illustrations we use the parameter values

$$\xi_s = \xi_d = 5 \quad (\text{i.e., } \mu_s = \mu_d = 0.2), \qquad \gamma = 0.5, \tag{4.18}$$

unless otherwise stated. For $\xi_s = \xi_d = 5$, we have $\gamma_c \approx 0.615735$, so that the chosen
value of $\gamma$ is smaller than $\gamma_c$. We have $\beta_{\max} \approx 0.245912$ for Model I and $\beta_{\max} \approx 0.475490$
for Model II.
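The numerical values quoted above follow directly from (4.14), (4.16) and (4.17). A short sketch of the arithmetic, with illustrative variable names:

```python
import math

mu_s = mu_d = 0.2          # i.e. xi_s = xi_d = 5, as in (4.18)
gamma = 0.5
E = math.exp(mu_s + mu_d)

gamma_c = math.exp(mu_d) / (2.0 * E - 1.0)                               # (4.17)
beta_max_I = (E - 1.0) * gamma                                           # (4.14)
beta_max_II = min(math.exp(mu_d) * (1.0 - math.exp(mu_s) * gamma), 1.0)  # (4.16)

print(f"gamma_c = {gamma_c:.6f}")
print(f"beta_max: Model I = {beta_max_I:.6f}, Model II = {beta_max_II:.6f}")
```

Since the chosen $\gamma = 0.5$ lies below $\gamma_c$, Model I has the smaller admissible range of $\beta$ at these parameter values.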



5. Response to LTP input signals: power-law forgetting

5.1. Single LTP signal

When a single LTP input signal is applied at time $t = 1$ to the synapse in its default
state, it will get polarised in response, and thus 'learn' the signal. Later on, under
the influence of a white-noise random input signal for times $t \ge 2$, it will forget the
LTP signal, and return to its default state. We will show that the process of forgetting
is robust with respect to the architectural differences between the two models, and is
characterised by a universal power law.

The polarised probability profile of the synapse at time $t = 1$ is obtained by acting
once with equation (3.6) or (3.9) onto the default state (4.6). We thus obtain
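• Model I:

$$\begin{aligned}
P_0(1) &= \tfrac12(1 - e^{-\mu_s})(1 + \alpha e^{-\mu_s} - \beta), \\
P_n(1) &= \tfrac12(1 - e^{-\mu_s})\,e^{-n\mu_s} + \tfrac12(1 - e^{-\mu_s})\,e^{-n(\mu_s+\mu_d)}\big(\alpha e^{-\mu_s} - \alpha e^{\mu_d} - \beta\big) \quad (n \ge 1), \\
Q_0(1) &= \tfrac12(1 - e^{-\mu_s})\big(1 + \beta/(1 - e^{-\mu_s-\mu_d}) - \gamma\big), \\
Q_n(1) &= \tfrac12(1 - e^{-\mu_s})\,e^{-n\mu_s} + \tfrac12(1 - e^{-\mu_s})\,e^{-n(\mu_s+\mu_d)}\,(e^{\mu_s+\mu_d} - 1)\,\gamma \quad (n \ge 1).
\end{aligned} \tag{5.1}$$

• Model II:

$$\begin{aligned}
P_0(1) &= \tfrac12(1 - e^{-\mu_s})(1 + \alpha e^{-\mu_s} - \beta), \\
P_n(1) &= \tfrac12(1 - e^{-\mu_s})\,e^{-n\mu_s} + \tfrac12(1 - e^{-\mu_s})\,e^{-n(\mu_s+\mu_d)}\big(\alpha e^{-\mu_s} - \alpha e^{\mu_d} - \beta\big) \quad (n \ge 1), \\
Q_0(1) &= \tfrac12(1 - e^{-\mu_s})(1 + \beta - \gamma), \\
Q_n(1) &= \tfrac12(1 - e^{-\mu_s})\,e^{-n\mu_s} + \tfrac12(1 - e^{-\mu_s})\,e^{-n(\mu_s+\mu_d)}\big((e^{\mu_s+\mu_d} - 1)\,\gamma + \beta\big) \quad (n \ge 1).
\end{aligned} \tag{5.2}$$

The instantaneous output signal, i.e., the total polarisation $D(1)$ of the synapse
just after the LTP signal, takes the same value proportional to $\beta$ for both models:

$$D(1) = \lambda_1\,\beta, \qquad \lambda_1 = \frac{1 - e^{-\mu_s}}{1 - e^{-\mu_s-\mu_d}}. \tag{5.3}$$

For $\xi_s = \xi_d = 5$ we have $\lambda_1 \approx 0.549833$.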
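The result (5.3) can be cross-checked by applying a single LTP event to the default state numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the synapse is truncated to a finite number of levels with a closed bottom, $\alpha_0 = 0$ is assumed, $\beta = 0.2$ is an illustrative choice, and all names are hypothetical.

```python
import math

def ltp_step(P, Q, model, alpha, beta, gamma, mu_d):
    """Apply one LTP event, following (3.6) for Model I or (3.9) for Model II.

    Assumptions: finite truncation with a closed bottom level, alpha_0 = 0.
    """
    N = len(P)
    a = [0.0] + [alpha * math.exp(-(n - 1) * mu_d) for n in range(1, N)]
    b = [beta * math.exp(-n * mu_d) for n in range(N)]
    g = [gamma * math.exp(-n * mu_d) for n in range(N)]
    g[N - 1] = 0.0  # closed bottom
    newP, newQ = [], []
    for n in range(N):
        climb_in = a[n + 1] * P[n + 1] if n < N - 1 else 0.0
        fall_in = g[n - 1] * Q[n - 1] if n > 0 else 0.0
        newP.append((1.0 - a[n] - b[n]) * P[n] + climb_in)
        newQ.append((1.0 - g[n]) * Q[n] + fall_in)
        if model == 2:
            newQ[n] += b[n] * P[n]        # Model II: cross over at the same level
    if model == 1:
        newQ[0] += sum(bn * Pn for bn, Pn in zip(b, P))  # Model I: hop to the top
    return newP, newQ

mu_s = mu_d = 0.2
beta, gamma = 0.2, 0.5
E = math.exp(mu_s + mu_d)
alpha = {1: math.exp(mu_s) * (gamma - beta / (E - 1.0)),   # from (4.9)
         2: math.exp(mu_s) * gamma}                         # from (4.10)

N = 200
P_st = [0.5 * (1.0 - math.exp(-mu_s)) * math.exp(-n * mu_s) for n in range(N)]
lambda_1 = (1.0 - math.exp(-mu_s)) / (1.0 - math.exp(-mu_s - mu_d))   # (5.3)

D1 = {}
for model in (1, 2):
    P1, Q1 = ltp_step(P_st, P_st, model, alpha[model], beta, gamma, mu_d)
    D1[model] = sum(Q1) - sum(P1)   # total polarisation D(1)

print(D1[1], D1[2], lambda_1 * beta)
```

Both models give the same $D(1)$ because the only moves that change the total polarisation are the $\beta_n$ transitions, whose total weight is the same whether they land at the top (Model I) or at the same level (Model II).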

The synapse then evolves under the influence of a white-noise random input during
the subsequent forgetting phase. This evolution is described for Model I by the action
of the recursion (4.2) on the probabilities (5.1), and for Model II by the action of (4.4)
on (5.2). Figure 4 shows plots of the reduced polarisation signals $D(t)/D(1)$ against
time $t$ for Model I (left) and Model II (right), for several values of $\beta$. For small enough $\beta$,
the polarisation overshoots, i.e., it keeps increasing beyond $D(1)$ in a transient regime
at the beginning of the forgetting phase. The duration of this transient overshoot gets
larger for smaller $\beta$, and formally diverges in the $\beta \to 0$ limit.

Figure 4. Plot of the reduced total polarisation $D(t)/D(1)$ after a single LTP signal,
against time $t$, for both models and several $\beta$ (see legends).

This paradoxical behaviour can be explained as follows. In the forgetting phase,
the total polarisation obeys the balance equation

$$D(t+1) - D(t) = -\sum_{n\ge0} \beta_n D_n(t). \tag{5.4}$$

Generically, then, $D(t)$ decays to zero, as expected. It may however grow in a transient
regime, leading to the overshoot mentioned above, provided the initial polarisation
profile is inhomogeneous enough so as to satisfy both

$$D(t) = \sum_{n\ge0} D_n(t) > 0 \quad\text{and}\quad \sum_{n\ge0} \beta_n D_n(t) < 0. \tag{5.5}$$

For a single LTP signal, the initial profile at time $t = 1$ is such that $D_0(1) < 0$,
whereas $D_n(1) > 0$ for $n \ge 1$, for both models and with $\beta$ small (see (5.1), (5.2)).
Since the rates $\beta_n$ fall off exponentially in $n$, the initial profile is thus likely to obey
the inequalities (5.5), thus leading to the overshoot. In fact, it can be shown that the
overshoot always occurs for $\beta < \beta_{\rm over}(\gamma)$, where:

• Model I:

$$\beta_{\rm over}(\gamma) = \frac{(1 - e^{-\mu_d})(1 - e^{-\mu_s-\mu_d})}{1 - e^{-\mu_s-2\mu_d}}\,\gamma. \tag{5.6}$$

• Model II:

$$\beta_{\rm over}(\gamma) = (1 - e^{-\mu_d})\,\gamma. \tag{5.7}$$

For the parameters (4.18) we have $\beta_{\rm over} \approx 0.066226$ for Model I and $\beta_{\rm over} \approx 0.090634$
for Model II.
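A brief sketch of the arithmetic behind these two threshold values, for the parameters (4.18). The function name is illustrative, and the closed forms coded below are the expressions (5.6) and (5.7) as read here.

```python
import math

mu_s = mu_d = 0.2   # parameters (4.18)
gamma = 0.5

def beta_over(model, gamma, mu_s, mu_d):
    """Overshoot threshold: (5.6) for Model I, (5.7) for Model II."""
    x = 1.0 - math.exp(-mu_d)
    if model == 1:
        ratio = (1.0 - math.exp(-mu_s - mu_d)) / (1.0 - math.exp(-mu_s - 2.0 * mu_d))
        return x * ratio * gamma
    return x * gamma

print(f"Model I:  {beta_over(1, gamma, mu_s, mu_d):.6f}")
print(f"Model II: {beta_over(2, gamma, mu_s, mu_d):.6f}")
```

The Model I threshold is the Model II threshold multiplied by a factor smaller than one, so at equal $\gamma$ the overshoot sets in over a narrower range of $\beta$ in Model I.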

To summarise, the instantaneous response $D(1)$ to an LTP signal is proportional
to $\beta$, and therefore larger for larger $\beta$; its subsequent decay is, however, fast for large $\beta$
(an undesirable feature), whereas it is slow and even non-monotonic for smaller $\beta$.
This suggests the absence of a natural criterion for defining an optimal $\beta$, where quick
learning and slow forgetting might simultaneously occur at the synapse.



5.2. Universal power-law forgetting 

The asymptotic fall-off of the total polarisation of the synapse in response to a single
LTP signal is illustrated in Figure 5, showing a log-log plot of $D(t)$ for much longer
times (up to $t = 10^5$). The data for both our models show a common power-law decay:
thus, for our choice of parameter values, $D(t) \sim 1/t^2$ in both cases.§ This is known as
power-law forgetting, which will be analysed below.

§ Corrections to the asymptotic power law are, however, stronger for Model I.

Figure 5. Log-log plot of the total polarisation $D(t)$ after a single LTP signal, against
time $t$, for both models and several $\beta$ (see legends). The absolute slope of the dashed
lines is the theoretical value (5.11), i.e., $\theta = 2$.

The expressions (5.1), (5.2) show that the initial polarisation profile decays
exponentially as a function of level depth $n$, as

$$D_n(1) \sim e^{-n(\mu_s+\mu_d)}. \tag{5.8}$$

This exponential decay is governed by the product of the probabilities $S_n^{\rm st} \sim e^{-n\mu_s}$ of
the default state (see (4.6)) and the polarising rate $\beta_n \sim e^{-n\mu_d}$ (see (2.1)).

Now consider the synapse at a late stage of the forgetting phase ($t \gg 1$). The white-noise
input essentially erases the polarisation profile down to a level depth $n_*$ such that
$\beta_{n_*} t \sim 1$. More details on this derivation can be found in Appendix A. This gives:

$$n_* \approx \xi_d \ln t. \tag{5.9}$$

Of course, the only part of the polarisation that survives at large times $t$ is the part
which has not yet been forgotten: this lives in the deeper levels ($n > n_*$), where white
noise has not yet erased the remnants of the memory. The total polarisation is therefore
expected to scale as $D_{n_*}(1)$. Using the estimates (5.8) and (5.9), we obtain an asymptotic
power-law decay of the polarisation signal:

$$D(t) \sim t^{-\theta}, \tag{5.10}$$

with

$$\theta = 1 + \frac{\mu_s}{\mu_d} = 1 + \frac{\xi_d}{\xi_s}. \tag{5.11}$$

The forgetting exponent $\theta$ thus obtained only depends on the ratio of the static and
dynamical lengths $\xi_s$ and $\xi_d$. Its expression (5.11) is universal, in the sense that it holds
irrespective of the model architecture, and of the rates $\alpha$, $\beta$, and $\gamma$, besides the fact
that $\xi_s$ is related to the latter parameters in a model-dependent way (see (4.9), (4.10)).
We would thus expect power-law forgetting with exponent $\theta$ to be manifested for a large
class of learnt signals.
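For completeness, the scaling argument leading to the forgetting exponent can be written out in two lines. This is only a restatement of the heuristic estimates already used (the erasure depth $n_*$ and the exponential initial profile), with prefactors and corrections ignored:

```latex
\begin{align*}
\beta_{n_*}\, t \sim 1
  &\;\Longrightarrow\; e^{-n_*\mu_d}\, t \sim 1
   \;\Longrightarrow\; n_* \approx \frac{\ln t}{\mu_d} = \xi_d \ln t, \\
D(t) \sim D_{n_*}(1)
  &\sim e^{-n_*(\mu_s+\mu_d)}
   = \exp\!\big[-(\mu_s+\mu_d)\,\xi_d \ln t\big]
   = t^{-(1+\mu_s/\mu_d)} = t^{-\theta}.
\end{align*}
```

For $\xi_s = \xi_d$ this gives $\theta = 2$, the slope of the dashed lines in Figure 5.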

It is worth remarking here that, if the synapse were finite rather than infinite, and
consisted of $N$ levels, the power-law decay (5.10) would be cut off exponentially at a time $\tau$
such that $\beta_N \tau \sim 1$. The cutoff timescale thus obtained,

$$\tau \sim e^{N\mu_d} = e^{N/\xi_d}, \tag{5.12}$$

is exponentially large in the ratio of the number $N$ of levels to the dynamical length $\xi_d$.



5.3. DC signal (sustained LTP signal) 

We now turn to the investigation of a DC input signal, i.e., a sustained LTP input signal
lasting for $T$ time steps ($\epsilon(t) = +1$ for $1 \le t \le T$). The synapse is again assumed to be
initially in its default state.

The learning and forgetting processes will be qualitatively similar to the above,
while novel qualitative features emerge deep in the DC regime, i.e., when the duration
of the LTP signal is long enough so that the product $\beta T$ is large. In this regime, the
synapse gets almost totally polarised under the persistent action of the input signal. This
saturation phenomenon is illustrated in Figure 6, which shows the total polarisation $D(t)$
of both models for several durations $T$ of the DC signal.



Figure 6. Plot of the total polarisation $D(t)$ of both models (panels: Model I, Model II) with $\beta = 0.2$, against
time $t$, for several durations $T$ of the DC signal (see legends).



The synapse slowly builds up a long-term memory in the presence of a long DC
signal, as the polarisation profile moves to deeper and deeper levels. This feature is
illustrated in Figure 7, which shows a plot of the full polarisation profile of Model II
at the end of the learning phase, for several durations $T$ of the DC signal. When the
synapse becomes fully polarised in the late-time regime ($\beta t \gg 1$), the level polarisations
become approximately $D_n(t) = Q_n(t)$; for both models, the signals travel down the
synapse with exponentially decaying rates. Thus, both (3.6) and (3.9) become:

$$D_n(t+1) = (1 - \gamma_n)\,D_n(t) + \gamma_{n-1}\,D_{n-1}(t). \tag{5.13}$$

The polarisation dynamics are therefore modelled by that of the logarithmic walker
(Appendix A). Thus, at the end of the learning phase ($t = T$), the polarisation profile
will have the form of a sharply peaked travelling wave (see (A.5)), around a mean depth
which grows according to the logarithmic law (see (A.2))

$$\langle n\rangle \approx \xi_d \ln \gamma T. \tag{5.14}$$


Figure 7. Plot of the polarisation profile $D_n(T)$ of Model II with $\beta = 0.2$ at the end
of the learning phase, against level depth $n$, for several durations $T$ of the DC signal
(see legend).



We now turn to the decay of the total polarisation D(t) generated by a sustained 
LTP signal, deep in the DC regime. Figure 8 shows a log-log plot of D(t) for several 
durations T of the DC signal, and much longer observation times. The polarisation 
decays via the universal power law (5.10), irrespective of the length of the learning 
phase, driving home the universality of power-law forgetting. 



Figure 8. Log-log plot of the total polarisation $D(t)$ against time $t$, for both models (panels: Model I, Model II)
with $\beta = 0.2$ and several durations $T$ of the DC signal (see legends). The absolute
slope of the dashed lines is $\theta = 2$.

5.4. Non-universal transient power-law forgetting

So far, we have shown that most features of our two models of the synapse are fairly
robust to their different architectures. In the following, however, we describe an important
phenomenon where the two models differ strongly, for a sustained or persistent signal
and at larger timescales.

Figure 9 shows a log-log plot of D(t) against the time ratio t/T, for both models in 
the regime where the duration T of the DC signal and the observation time t are both 
large and comparable. In the case of Model I, the data for the longer times exhibit a 
clear collapse, indicating a scaling behaviour of the form 

$$D(t) \approx F(t/T), \tag{5.15}$$

which is a signature of (simple) aging [23]. The corresponding scaling function falls off
as $F(x) \sim x^{-\theta}$. For Model II, the decay of the total polarisation is more subtle, and
exhibits two successive regimes: (i) a transient regime, where $D(t)$ exhibits simple aging
in terms of $t/T$, and falls off rather rapidly; (ii) an asymptotic regime, where $D(t)$ falls
off with the universal exponent $\theta$, but does not obey simple aging.

Figure 9. Log-log plot of the total polarisation $D(t)$ against the time ratio $t/T$, for
several durations $T$ of the DC signal (see legends). The absolute slope of the black
dashed lines is $\theta = 2$, while that of the red one for Model II is $\Theta \approx 5.056$.



This qualitative difference between both models is investigated in detail in
Appendix B, but we give a simple flavour here: suppose the synapse is in a polarised
state where only the uppermost level is occupied, when the process of forgetting
begins. For Model I, the polarisation always falls off with the universal forgetting
exponent $\theta$, whereas for Model II it falls off more rapidly, with a larger transient
forgetting exponent $\Theta$ which depends continuously on $\beta$ (see (B.13)). For $\beta = 0.2$
and $\gamma = 0.5$ we have $\Theta \approx 5.056516$. We are thus led to the following scenario for
Model II: (i) the bulk of the polarisation profile is sharply localised around the typical
depth (5.14) (see Figure 7), and therefore falls off with the transient exponent $\Theta$; (ii)
the tail of the polarisation profile, which still has the universal exponential form (5.8)
(as long as $T$ is finite), is responsible for the subsequent universal asymptotic decay.

6. Fluctuations in default state and signal-to-noise ratio 

The average response of the synapse to a white-noise random input signal defines its 
default state, investigated in Section 4. However, there are appreciable dynamical 
fluctuations around this average, which are seen on plots (see Figure 10) of the mean level 
depth (n(t)) (left) and of the total polarisation D(t) (right) for Model I.|| The average 
quantities in each case are shown as red horizontal lines, for ease of comparison. The 
mean polarisation vanishes, whereas the mean level depth is (n) st « 4.516655 (see (4.8)). 



Figure 10. Response of the synapse to a single instance of white-noise random input 
signal (Model I, $\beta = 0.2$). Left: mean level depth $\langle n(t)\rangle$. Right: total polarisation $D(t)$. 
Red lines: average quantities, characteristic of the default state. 



The large fluctuations observed in both quantities are due to the occurrence of 
long ordered subsequences (patches) of LTP or of LTD events in the input signal 
($\varepsilon(t_0+1) = \cdots = \varepsilon(t_0+T)$). Patches of duration T occur with exponentially small 
probabilities $2^{-(T-1)}$, so that for a total observation time t, the largest ordered patch has 
$T \approx \ln(2t)/\ln 2$. For a time of observation t = 2000, for example, we can have patches 
of temporal length as large as $T \approx 12$. The main effect of, say, a long patch of LTP/LTD 
events, is that the synapse gets more and more positively/negatively polarised, with 
the signal penetrating to ever deeper levels along the appropriate branch. Such large 
fluctuations in $D(t)$ are therefore distributed symmetrically around zero, whereas those 
in $\langle n(t)\rangle$ are toward deeper levels. Clearly, the fluctuations in both quantities should be 
strongly correlated and the plots show that they are. 
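
As a quick check of the estimate above, the following Python sketch evaluates $\ln(2t)/\ln 2$ and measures the longest ordered patch in one sampled white-noise signal. The function names and the seed are illustrative assumptions, not part of the models.

```python
import math
import random


def largest_patch_estimate(t: int) -> float:
    """Estimate ln(2t)/ln 2, obtained by setting t * 2**-(T-1) ~ 1."""
    return math.log(2 * t) / math.log(2)


def longest_patch(signal) -> int:
    """Length of the longest run of identical consecutive events."""
    if not signal:
        return 0
    best = run = 1
    for prev, cur in zip(signal, signal[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


rng = random.Random(1)  # arbitrary seed, for reproducibility only
white_noise = [rng.choice((+1, -1)) for _ in range(2000)]
print(largest_patch_estimate(2000), longest_patch(white_noise))
```

The measured value fluctuates from one realisation to the next; only its typical size is expected to follow the logarithmic estimate.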

We define the (amplitude) signal-to-noise ratio R of our models as the ratio of 
the instantaneous single LTP signal response $D(1)$ (see (5.3)) to the standard 
deviation $D_{\rm rms} = \langle D^2\rangle^{1/2}$ of the spontaneous fluctuations around the default state: 

$$R = \frac{D(1)}{D_{\rm rms}} = \frac{D(1)}{\langle D^2\rangle^{1/2}}. \tag{6.1}$$

|| Similar qualitative behaviour would be obtained for Model II. 

Figure 11 shows a plot of the signal-to-noise ratio thus defined, against $\beta$, for both 
models. The mean squared polarisation $\langle D^2\rangle$ is measured by numerically evaluating the 
response of our models to a very long sequence of white noise. Both datasets exhibit 
essentially the same monotonic dependence on $\beta$. They seem to obey the scaling behaviour 
$R \sim \sqrt{\beta}$ at small $\beta$, suggested by the forthcoming analysis of the limiting regime $\xi_s \to 0$ 
(see (6.4)). Conversely, R is maximal at $\beta = \beta_{\max}(\gamma)$, and this maximal value is essentially 
determined by the range $\beta_{\max}(\gamma)$ of allowed values of $\beta$ (see (4.14), (4.16)). 



Figure 11. Plot of the signal-to-noise ratio R of both models against $\beta < \beta_{\max}(\gamma)$ 
(see (4.14), (4.16)). 



For Model I with $\xi_s = \xi_d = 5$, the signal-to-noise ratio R reaches its global maximum 
over $\beta$ and $\gamma$, i.e., $R_{\max} \approx 0.645$, at $\gamma = 1$, i.e., at point G (see Figure 3). For Model II, 
the global maximum $R_{\max} \approx 0.951$ is reached in the $\gamma \to 0$ limit, i.e., again at point G. 

Optimising the signal-to-noise ratio even further necessitates allowing the lengths $\xi_s$ 
and $\xi_d$ to vary; R is observed to reach its absolute maximum R = 1 in the $\xi_s \to 0$ limit. 
In this regime, the polarisation reads $D(t) = D_0(t) = Q_0(t) - P_0(t)$, and it is governed 
by the following simple dynamical equation 

$$D(t+1) = (1-\beta)\,D(t) + \beta\,\varepsilon(t+1) \tag{6.2}$$

for both models. In the stationary state for a white-noise input, we thus have 



$$\langle D^2\rangle = \beta^2\sum_{k\ge 0}(1-\beta)^{2k} = \frac{\beta}{2-\beta}, \tag{6.3}$$

and finally 

$$R = \sqrt{\beta(2-\beta)}. \tag{6.4}$$

The signal-to-noise ratio thus obeys a quarter-of-a-circle law as a function of $\beta$, and 
attains its absolute maximum R = 1 at $\beta = 1$, irrespective of anything else. 

This extreme $\xi_s \to 0$ regime is however of little interest, as all the action takes place 
in the uppermost level (n = 0), so that metaplasticity is lost. 
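
The limiting results (6.3) and (6.4) can be checked numerically by iterating (6.2) with a white-noise input. The sketch below is a minimal illustration; the function names, the burn-in length and the seed are assumptions made here, and $D(1) = \beta$ is taken as the single-event response in this limit, as implied by (6.2) for a synapse starting from D = 0.

```python
import math
import random


def simulate_mean_square(beta: float, steps: int = 200_000, seed: int = 0) -> float:
    """Time average of D(t)**2 for D(t+1) = (1-beta)*D(t) + beta*eps(t+1)."""
    rng = random.Random(seed)
    burn_in = 1000  # discard the initial transient
    d = 0.0
    total = 0.0
    for t in range(steps + burn_in):
        eps = 1 if rng.random() < 0.5 else -1  # white-noise LTP/LTD event
        d = (1.0 - beta) * d + beta * eps
        if t >= burn_in:
            total += d * d
    return total / steps


def snr_limit(beta: float) -> float:
    """Quarter-of-a-circle law R = sqrt(beta*(2-beta)) of (6.4)."""
    return math.sqrt(beta * (2.0 - beta))


beta = 0.2
measured = simulate_mean_square(beta)
print(measured, beta / (2.0 - beta), beta / math.sqrt(measured), snr_limit(beta))
```

The first two printed numbers compare the measured and predicted $\langle D^2\rangle$, the last two the corresponding signal-to-noise ratios.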

7. Response to a variety of input signals 

In order to examine the storage of memories in the general case, we now examine the 
response of our two synapse models to a variety of types of time-dependent input signals. 
As already mentioned in the Introduction, this section completes the systematic study 
of our models viewed from a physicist's perspective as signal processing units. 

7.1. AC signal 

An AC signal is a perfect alternation of LTP and LTD events, represented by the input 

$$\varepsilon(t) = (-1)^t. \tag{7.1}$$

After a short transient, the synapse reaches a stationary state, where the occupation 
probabilities keep oscillating in phase with the input signal, according to 

$$\begin{aligned}
t\ \text{even}\ (\varepsilon(t) = +1):&\quad P_n(t) = A_n,\quad Q_n(t) = B_n,\\
t\ \text{odd}\ (\varepsilon(t) = -1):&\quad P_n(t) = B_n,\quad Q_n(t) = A_n.
\end{aligned} \tag{7.2}$$

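The staggered probabilities $A_n$ and $B_n$ are given by the normalised solution of the 
following equations: 

• Model I: 

$$\begin{aligned}
A_n &= (1-\alpha_n-\beta_n)B_n + \alpha_{n+1}B_{n+1},\\
B_n &= (1-\gamma_n)A_n + \gamma_{n-1}A_{n-1} + \delta_{n0}\bar B,
\end{aligned} \tag{7.3}$$

with 

$$\bar B = \sum_{n\ge 0}\beta_n B_n. \tag{7.4}$$

• Model II: 

$$\begin{aligned}
A_n &= (1-\alpha_n-\beta_n)B_n + \alpha_{n+1}B_{n+1},\\
(1-\beta_n)B_n &= (1-\gamma_n)A_n + \gamma_{n-1}A_{n-1}.
\end{aligned} \tag{7.5}$$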

The staggered polarisation of the stationary state reads 

$$D^* = \lim_{t\to\infty}\bigl(\varepsilon(t)D(t)\bigr) = \sum_{n\ge 0}(B_n - A_n). \tag{7.6}$$

This quantity starts increasing linearly with $\beta$, as 

$$D^* \approx \lambda_{AC}\,\beta, \tag{7.7}$$

irrespective of the model, provided parameters are the same. In the $\beta \to 0$ limit, (7.3) 
and (7.5) indeed simplify to the same equations 

$$\begin{aligned}
A_n^{(0)} &= (1-\alpha_n)B_n^{(0)} + \alpha_{n+1}B_{n+1}^{(0)},\\
B_n^{(0)} &= (1-\gamma_n)A_n^{(0)} + \gamma_{n-1}A_{n-1}^{(0)},
\end{aligned} \tag{7.8}$$

whose normalised solution $A_n^{(0)}$, $B_n^{(0)}$ is therefore model-independent. We have then 

$$\lambda_{AC} = \sum_{n\ge 0} e^{-n\mu_d}B_n^{(0)}. \tag{7.9}$$

For the parameters (4.18) this gives $\lambda_{AC} \approx 0.329712$. 

Figure 12 shows a plot of $D^*$ against $\beta$ for both models. For each model, $\beta$ is 
limited by $\beta_{\max}(\gamma)$. Once again we see that the staggered polarisation reaches larger 
values for Model II, mainly as a consequence of the larger range of allowed values of $\beta$. 




Figure 12. Plot of the stationary staggered polarisation $D^*$ of both models submitted 
to an AC signal, against $\beta < \beta_{\max}(\gamma)$ (see (4.14), (4.16)). 

7.2. Coloured random signal 

We next consider a coloured random input signal defined by the following rule: 

$$\varepsilon(t+1) = \begin{cases} \varepsilon(t) & \text{with probability } r,\\ -\varepsilon(t) & \text{with probability } 1-r, \end{cases} \tag{7.10}$$

with $\varepsilon(1) = +1$ for definiteness. 

The persistence probability r allows this coloured random signal to interpolate 
between several situations described above: 

• the DC signal investigated in Section 5.3 is recovered for r = 1, 

• the AC signal investigated in Section 7.1 is recovered for r = 0, 

• the white-noise signal investigated in Sections 4 and 6 is recovered for $r = 1/2$. 

The correlation function of the signal (7.10) is 

$$S(t) = \langle\varepsilon(t_0)\,\varepsilon(t_0+t)\rangle = (2r-1)^t \qquad (t \ge 0). \tag{7.11}$$

The coloured signal is therefore positively correlated, or persistent, for $1/2 < r < 1$. The 
corresponding characteristic time 

$$\tau = \frac{1}{|\ln(2r-1)|} \tag{7.12}$$

diverges near the DC limit ($r \to 1$) as $\tau \approx 1/(2(1-r))$. The signal is anti-persistent, 
with oscillating correlations, for $0 < r < 1/2$. 
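
The rule (7.10) and the correlation function (7.11) can be illustrated with a short Python sketch. The function names and the seed are assumptions made for this example; list index 0 corresponds to time t = 1.

```python
import random


def coloured_signal(r: float, length: int, seed: int = 0) -> list:
    """Sample eps(1..length): each event repeats the previous one with probability r."""
    rng = random.Random(seed)
    eps = [1]  # eps(1) = +1 for definiteness
    for _ in range(length - 1):
        eps.append(eps[-1] if rng.random() < r else -eps[-1])
    return eps


def empirical_correlation(eps: list, lag: int) -> float:
    """Time-averaged estimate of <eps(t0) * eps(t0 + lag)>."""
    n = len(eps) - lag
    return sum(eps[i] * eps[i + lag] for i in range(n)) / n


r = 0.8
signal = coloured_signal(r, 200_000)
for lag in (1, 2, 3):
    print(lag, empirical_correlation(signal, lag), (2 * r - 1) ** lag)
```

Setting r = 1 or r = 0 in this sketch reproduces the DC and AC signals, respectively.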

The synapse submitted to a coloured random input signal reaches a fluctuating 
stationary state after a relatively short transient. For a given realisation, it exhibits 
strong dynamical fluctuations which are qualitatively similar to those shown in 
Figure 10. Figure 13 shows plots of the total polarisation D(t) for two typical realisations 
of coloured random signal, in an anti-persistent case (r = 0.2, left) and in a persistent 
case (r = 0.8, right). Both the amplitude and the correlation time of the fluctuations 
are observed to increase with r, as might be expected. 

Figure 13. Plot of the total polarisation $D(t)$ in response to a single realisation of 
coloured random input (Model I, $\beta = 0.2$). Left: an anti-persistent case (r = 0.2). 
Right: a persistent case (r = 0.8). 



Figure 14 shows a plot of (numerically measured) stationary values of the mean 
depth $\langle n\rangle$ (left) and of the mean squared polarisation $\langle D^2\rangle$ (right), for both models 
with $\beta = 0.1$ and 0.2 and a varying persistence probability r. The mean depth starts 
from its lowest value in the $r \to 0$ limit, i.e., for the AC signal. It increases smoothly 
as a function of r, and diverges logarithmically as $\langle n\rangle \approx \xi_d\ln\tau \approx \xi_d|\ln(1-r)|$ near the 
DC limit ($r \to 1$).¶ All the curves cross at the white-noise point ($r = 1/2$), where the 
result (4.8) holds irrespective of the model and of its parameters. The dependence of 
the mean depth on the persistence probability r is far more pronounced for Model II 
than for Model I. The behaviour of the mean squared polarisation $\langle D^2\rangle$ provides another 
appreciable difference between the two models. In both cases it starts increasing as a 
function of r, from a very small value in the $r \to 0$ limit. Its behaviour as $r \to 1$ is 
however very different in the two models. The mean squared polarisation keeps steadily 
increasing in the case of Model I, whereas its increase is much less pronounced for 
Model II, even becoming non-monotonic at high enough $\beta$. 

¶ This logarithmic law can be derived in the same spirit as (A.2) and (5.14). 

Figure 14. Plot of the stationary values of the mean depth $\langle n\rangle$ (left) and of the 
mean squared polarisation $\langle D^2\rangle$ (right), against the persistence probability r, for both 
models with $\beta = 0.1$ and 0.2. 



This qualitative difference between the responses of both models to highly persistent 
random signals can be related to the difference in their transient responses, investigated 
in Section 5.4. In Model II, the low-frequency components of the memory lie slightly 
deeper within the synapse. More importantly, they relax much faster than in Model I, 
as their falloff can be characterised by a larger, non-universal exponent $\Theta$. 



7.3. Oscillatory signal 

The last case we consider is that of an oscillatory input signal, which consists of 
alternating long blocks of LTP and LTD signals of length T time steps, i.e., 

$$\varepsilon(t) = (-1)^{{\rm Int}(t/T)}, \tag{7.13}$$

where Int denotes the integer part. 
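
A minimal Python sketch of the oscillatory signal (7.13) follows; the function name is an illustrative assumption, and integer floor division plays the role of Int for non-negative t.

```python
def oscillatory_signal(t: int, half_period: int) -> int:
    """Return (-1)**Int(t/T): blocks of T LTP events alternate with blocks of T LTD events."""
    return 1 if (t // half_period) % 2 == 0 else -1


# Two full periods of a signal with half-period T = 3.
print([oscillatory_signal(t, 3) for t in range(12)])
```

With a half-period of one time step, this reduces to the AC signal (7.1).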

After a relatively short transient regime, the synapse converges toward a periodic 
state, where the polarisation and other quantities oscillate with the period 2T of the 
input signal. Figure 15 shows a plot of the values of the mean depth $\langle n\rangle$ (left) and of 
the mean squared polarisation $\langle D^2\rangle$ (right), averaged over one period in the stationary 
state of the synapse, for both models with $\beta = 0.2$, against the half-period T of the 
oscillatory signal. 

These data corroborate the observations made in Section 7.2. The dependence of 
the mean depth on T is again steeper for Model II than for Model I. The data for both 
models are however compatible with the common logarithmic asymptotic growth law 
$\langle n\rangle \approx \xi_d\ln T$. The mean squared polarisation $\langle D^2\rangle$ is observed to increase monotonically 
as a function of T for Model I, whereas for Model II it reaches a maximum and then 
smoothly decreases. 

Figure 15. Plot of the stationary values of the mean depth $\langle n\rangle$ (left) and of the 
mean squared polarisation $\langle D^2\rangle$ (right) for both models with $\beta = 0.2$, against the 
half-period T of the oscillatory signal. 



8. Discussion 

In this paper, we have provided the first thorough analysis of a single synapse including 
the effects of metaplasticity. We used two models: Model I is an extension of the original 
cascade model proposed by Fusi et al. [13], whereas Model II, of our own invention, has 
a different architecture. 

Our intention was, apart from the thorough quantification of earlier ideas [12], 
the isolation of the mechanisms responsible for the storage of memories, and the 
differentiation of short- and long-term memories in response to a range of signal types. 
In the structure of the models we analysed, long-term memories were stored at greater 
'depths', and therefore relatively immunised to the constant bombardment of white noise 
in the upper levels, which forms our everyday experience. The difference between the 
two models lies chiefly in the mechanism of response of the synapse to a flip in sign of 
the input signal. In Model I, such flips tend to cause the memory trace to be dislodged 
to become a short-term memory of the opposite kind, while in Model II, changes in 
long-term memories are allowed to be more persistent, remaining at low level depths. 

Most remarkably, the same asymptotic power-law forgetting (with universal 
exponent $\theta$) is manifested in both models, independent of their architecture. However, 
the aging behaviour of the models is rather different: Model I manifests simple aging, 
whereas Model II may have a long transient regime, where the polarisation falls off more 
rapidly (with a larger, non-universal and $\beta$-dependent transient forgetting exponent $\Theta$), 
before the asymptotic power-law forgetting takes over. 

The behaviour of both models has been further illustrated by their responses to a 
range of input signals. The two observables we focused on were the mean depth of a 
particular memory trace, and its polarisation. Our observations suggest that Model II 
allows in general for a slightly greater penetration of signals. The long-term memories 
thus created are however rather weaker than in the case of Model I. This weakening 
is to be put in perspective with the existence of the non-universal transient exponent, 
which is studied at length in Appendix B. Qualitatively speaking, it appears that the 
changing of 'opinions' represented by the two poles of a synapse at a relatively deep level 
(which is possible in Model II) has the effect of weakening the strength of a memory 
trace, far more than when contradictions are resolved by simply disposing of them in 
the short-term memories of the opposite pole. 

Our results provide the first prediction of the exponent of power-law forgetting at 
the level of a single synapse: the intensive analysis of these relatively simple models 
could help to start theoretical work on more complex architectures, since of course 
real-life forgetting relies not just on individual synapses, but on their connections to each 
other and to neurons. Possible extensions of this work might involve the coupling [24] 
of multiple synapses of the type presented above, or include the effect of correlated 
signals [25]; increasing correlations in these ways might enhance the competition between 
the bulk and the tails of the signals deep within a metaplastic synapse. 

Appendix A. The logarithmic walker 

The problem of the logarithmic walker is defined as follows. A particle lives on the 
semi-infinite chain, whose sites are labelled by the non-negative integers (n = 0, 1, ...). At 
each time step, if sitting at site n, the particle may hop to the right ($n \to n+1$) with 
exponentially decaying probability $e^{-n\mu}$. This model can alternatively be thought of as 
describing a discrete-time pure birth process, which has already been considered in [26]. 
If we assume for definiteness that the particle starts from the origin at time t = 0, 
the probability $p_n(t)$ for the particle to be at site n at time t obeys the recursion 

$$p_n(t+1) = (1 - e^{-n\mu})\,p_n(t) + e^{-(n-1)\mu}\,p_{n-1}(t), \tag{A.1}$$

with initial condition $p_n(0) = \delta_{n0}$. 

Figure A1 shows a plot of the probability profile $p_n(t)$ against n, for $\mu = 0.2$ and 
several times t. The profile is observed to form a peak around a well-defined mean 
position $\langle n(t)\rangle$. As time goes on, the profile keeps its shape, while the mean position 
exhibits a very slow growth. 

A first heuristic approach to estimate this growth law consists in writing down the 
dimensional estimate $e^{-n\mu}\,t \sim 1$, hence the result 

$$\langle n(t)\rangle \approx \frac{\ln t}{\mu}, \tag{A.2}$$

and the name, logarithmic walker. 

Figure A1. Plot of the probability profile $p_n(t)$ of the logarithmic walker against 
position n, for $\mu = 0.2$ and several times t (see legend). 

A more precise approach consists of writing down the following dynamical equation 
for the mean position of the walker at time t: 

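$$\langle n(t+1)\rangle - \langle n(t)\rangle = \sum_{n\ge 0} e^{-n\mu}\,p_n(t). \tag{A.3}$$

For a localised probability profile, we thus obtain approximately $d\langle n\rangle/dt \approx e^{-\mu\langle n\rangle}$, which 
yields 

$$\langle n(t)\rangle \approx \frac{\ln(1+\mu t)}{\mu}, \tag{A.4}$$

in agreement with (A.2). 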
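
The recursion (A.1) is easy to iterate directly. The following Python sketch does so and compares the resulting mean position with the estimate (A.4); the truncation of the chain at a finite site `n_max` and the function names are assumptions made for this illustration.

```python
import math


def walker_profile(mu: float, steps: int, n_max: int = 100) -> list:
    """Iterate (A.1) from p_n(0) = delta_{n0} on sites 0..n_max."""
    hop = [math.exp(-n * mu) for n in range(n_max + 1)]  # hop probability from site n
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(steps):
        new = [0.0] * (n_max + 1)
        for n in range(n_max + 1):
            new[n] += (1.0 - hop[n]) * p[n]
            # Weight hopping out of the last kept site is retained there (truncation).
            new[min(n + 1, n_max)] += hop[n] * p[n]
        p = new
    return p


def mean_position(p: list) -> float:
    """Mean site index of a probability profile."""
    return sum(n * w for n, w in enumerate(p))


mu, t = 0.2, 2000
profile = walker_profile(mu, t)
print(mean_position(profile), math.log(1.0 + mu * t) / mu)
```

The two printed values are expected to differ by a constant of order one, which the heuristic estimate (A.4) does not capture.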

Turning to more quantitative analysis, we look for an asymptotic solution to (A.1) 
in the form of a traveling wave moving on a logarithmic time scale, i.e., 

$$p_n(t) \approx F(x), \qquad x = n - \lambda, \qquad \lambda = \frac{\ln t}{\mu}. \tag{A.5}$$

It is worth emphasising the difference between the present situation, where time t 
enters the argument x through its logarithm and with an explicitly known prefactor $1/\mu$, 
and the usual situation of ballistic traveling waves, like e.g. in the FKPP equation [27]. 
For such traveling waves, time t is multiplied by an unknown velocity v, whose 
determination is non-trivial, and known to be very sensitive to discretization and other 
fluctuation effects [28]. 

The hull function $F(x)$ of the traveling wave (A.5) is found to obey the linear 
differential-difference equation 

$$F'(x) = \mu\,e^{-\mu x}\bigl(F(x) - e^{\mu}F(x-1)\bigr). \tag{A.6}$$

As a consequence, its Laplace transform 

$$L_F(s) = \int_{-\infty}^{+\infty} e^{-sx}F(x)\,dx \tag{A.7}$$

obeys the difference equation 

$$s\,L_F(s) = \mu\,(1-e^{-s})\,L_F(s+\mu), \tag{A.8}$$

whose normalised solution reads 

$$L_F(s) = \frac{1-e^{-s}}{\mu}\,\Gamma\!\left(\frac{s}{\mu}\right)P(s), \tag{A.9}$$

where $P(s)$ is the infinite product 

$$P(s) = \prod_{k\ge 1}\frac{1-e^{-s-k\mu}}{1-e^{-k\mu}}. \tag{A.10}$$

The latter product has zeros on the semi-infinite lattice of points $s = -k\mu + 2\pi i l$, with k 
and l integers such that $k \ge 1$, whereas $L_F(s)$ shares these zeros for $l \ne 0$ only. 

For the time being, let us consider the characteristic function of the position n, i.e., 
the generating function of the probabilities $p_n(t)$: 

$$E(u,t) = \langle e^{u\,n(t)}\rangle = \sum_{n\ge 0} p_n(t)\,e^{un}. \tag{A.11}$$

In the long-time regime, the traveling-wave form (A.5) of the probabilities translates to 

$$E(u,t) \approx \sum_{n=-\infty}^{\infty} F(n-\lambda)\,e^{un} = \sum_{l=-\infty}^{\infty} L_F(2\pi i l - u)\,e^{(u-2\pi i l)\lambda}, \tag{A.12}$$

where the right-hand side has been obtained by means of the Poisson summation 
formula. 

Setting u = 0 in (A.12), we obtain unity identically, as expected. The reason is 
that we have $L_F(0) = 1$, whereas $L_F(2\pi i l) = 0$ for $l \ne 0$; thus, the hull function $F(x)$ 
has the remarkable property that the 'stroboscoped' sum equals one for all values of the 
real variable $\lambda$: 

$$\sum_{n=-\infty}^{\infty} F(n-\lambda) = 1. \tag{A.13}$$

The asymptotic behaviour of the mean position $\langle n(t)\rangle$ can be derived by expanding 
the result (A.12) to first order in u. We thus obtain 

$$\langle n(t)\rangle \approx \frac{\ln t + C}{\mu} + \frac12 - P'(0) - \frac{1}{\mu}\sum_{l\ne 0}\Gamma\!\left(\frac{2\pi i l}{\mu}\right)e^{-2\pi i l\lambda}, \tag{A.14}$$

where C denotes Euler's constant. The sum is the Fourier series of a periodic function 
of $\lambda$, with unit period. These oscillations originate in the discrete nature of the sites 
of the chain. They manifest themselves e.g. in the shape of the probability profile near 
its top. Oscillations are however extremely small for global quantities such as $\langle n(t)\rangle$. 
Their amplitude is essentially given by the first Fourier coefficients ($l = \pm 1$), which are 
proportional to $e^{-\pi^2/\mu}$. For $\mu = 0.2$, this amplitude is of order $e^{-5\pi^2} \sim 10^{-22}$, while for 
$\mu = 1$ it is of order $e^{-\pi^2} \sim 10^{-5}$. 

Neglecting these (tiny) oscillations, we are left with 

$$\langle n(t)\rangle \approx \frac{\ln t + C}{\mu} + \frac12 - P'(0). \tag{A.15}$$
This expression confirms the estimates (A.2) and (A.4), and gives an explicit expression 
for the finite part of the logarithm, where 

$$P'(0) = \sum_{k\ge 1}\frac{1}{e^{k\mu}-1} = \int\frac{dz}{2\pi i}\,\Gamma(z)\,\zeta(z)^2\,\mu^{-z}. \tag{A.16}$$

The Mellin-Barnes integral representation of the right-hand side, where $\zeta(z)$ is 
Riemann's zeta function, is suited to the derivation of the expansion of $P'(0)$ in the 
regime of small $\mu$. We thus obtain the rapidly convergent expansion 

$$P'(0) = \frac{C - \ln\mu}{\mu} + \frac14 - \frac{\mu}{144} - \frac{\mu^3}{86400} + \cdots, \tag{A.17}$$

hence 

$$\langle n(t)\rangle \approx \frac{\ln\mu t}{\mu} + \frac14 + \frac{\mu}{144} + \frac{\mu^3}{86400} + \cdots \tag{A.18}$$

The full leading term was already correctly predicted in (A.4). 

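A similar treatment of the second moment $\langle n(t)^2\rangle$ demonstrates that the variance 
of the position saturates to the asymptotic value 

$${\rm var}\,n = \lim_{t\to\infty}\bigl(\langle n(t)^2\rangle - \langle n(t)\rangle^2\bigr) = \frac{\pi^2}{6\mu^2} + \frac{1}{12} - K, \tag{A.19}$$

again up to negligibly small periodic oscillations, with 

$$K = P'(0)^2 - P''(0) = \sum_{k\ge 1}\frac{e^{k\mu}}{(e^{k\mu}-1)^2} = \int\frac{dz}{2\pi i}\,\Gamma(z)\,\zeta(z)\,\zeta(z-1)\,\mu^{-z}. \tag{A.20}$$

We thus obtain 

$$K = \frac{\pi^2}{6\mu^2} - \frac{1}{2\mu} + \frac{1}{24} + \cdots, \tag{A.21}$$

hence 

$${\rm var}\,n = \frac{1}{2\mu} + \frac{1}{24} + \cdots \tag{A.22}$$

In both above expressions the dots stand for an exponentially small contribution, 
proportional to $e^{-4\pi^2/\mu}$. 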
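
The small-$\mu$ expansions of $P'(0)$ and K can be compared with direct evaluations of the corresponding sums over k. The Python sketch below does this for $\mu = 0.2$; the function names and the truncation of the sums at a finite number of terms are assumptions of this example.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant


def p_prime_0(mu: float, terms: int = 2000) -> float:
    """Direct sum of 1/(exp(k*mu) - 1), written with exp(-k*mu) to avoid overflow."""
    return sum(math.exp(-k * mu) / -math.expm1(-k * mu) for k in range(1, terms + 1))


def k_constant(mu: float, terms: int = 2000) -> float:
    """Direct sum of exp(k*mu)/(exp(k*mu) - 1)**2, again in overflow-safe form."""
    return sum(math.exp(-k * mu) / math.expm1(-k * mu) ** 2 for k in range(1, terms + 1))


mu = 0.2
series_p = (EULER_C - math.log(mu)) / mu + 0.25 - mu / 144 - mu ** 3 / 86400
series_k = math.pi ** 2 / (6 * mu ** 2) - 1 / (2 * mu) + 1 / 24
print(p_prime_0(mu), series_p)
print(k_constant(mu), series_k)
print(math.pi ** 2 / (6 * mu ** 2) + 1 / 12 - k_constant(mu), 1 / (2 * mu) + 1 / 24)
```

The last line compares the variance obtained from (A.19) with its expansion (A.22).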

To close, let us come back to the form of the hull function $F(x)$, which describes 
the asymptotic shape of the probability profile. The decay of the hull function at both 
ends is faster than exponential, since its Laplace transform $L_F(s)$, given in (A.9), is an 
entire function, i.e., it is analytic in the whole s plane. The decay of $F(x)$ as $x \to \pm\infty$ 
can be derived by inserting the asymptotic behaviour of $L_F(s)$ as $s \to \mp\infty$ in the inverse 
Laplace formula 

$$F(x) = \int\frac{ds}{2\pi i}\,e^{sx}L_F(s). \tag{A.23}$$

For $s \to +\infty$, we have $L_F(s) \approx \Gamma(s/\mu)/\mu$. We thus obtain a double exponential decay: 

$$F(x) \approx \exp(-e^{-\mu x}) \qquad (x \to -\infty). \tag{A.24}$$

For $s \to -\infty$, with exponential accuracy, we have $L_F(s) \sim e^{-s}L_F(s+\mu)$, so that 
$L_F(s) \sim e^{s^2/(2\mu)}$. The hull function therefore falls off as a Gaussian: 

$$F(x) \sim e^{-\mu x^2/2} \qquad (x \to +\infty). \tag{A.25}$$

Finally, in the regime of small $\mu$, where the variance ${\rm var}\,n \approx 1/(2\mu)$ is large, the whole 
hull function is nearly Gaussian: 

$$F(x) \approx \sqrt{\frac{\mu}{\pi}}\,\exp\bigl(-\mu\,(x - (\ln\mu)/\mu)^2\bigr). \tag{A.26}$$

Appendix B. Transient behaviour 

This Appendix is devoted to the transient behaviour of our models. Our main goal is 
to show that the transient responses of both models are qualitatively different. The 
polarisation of Model II exhibits a non-universal power-law decay, with a transient 
exponent $\Theta$ which depends continuously on $\beta$ (see (B.13)), whereas the polarisation of 
Model I always falls off with the universal exponent $\theta$ (see (5.11)). 

To probe this further, we analyse the transient average response of the synapse 
to a white-noise random input. We shift time so that t = 0 is the beginning of the 
forgetting period. For simplicity and without loss of generality, we caricature transient 
effects by choosing an initial state such that the transient regime will last forever. More 
specifically, we assume that the synapse is prepared in a totally polarised state living 
entirely on the uppermost level: $P_n(0) = 0$, $Q_n(0) = \delta_{n0}$. We will successively consider 
the level occupation probabilities and the level-resolved and total polarisations. 

Level occupation probabilities 

We begin with the level occupation probabilities $S_n(t)$. The following scenario is 
expected in the long-time regime: the $S_n(t)$ should converge rather fast to their 
stationary values $S_n^{\rm st}$ (see (4.5)) at moderate level depths, whereas their values at very 
deep levels should fall off more rapidly, as these are unaffected by the random input. 

From a quantitative viewpoint, along the lines of (A.5), we look for an asymptotic 
long-time solution to (4.2) or (4.4) for the $S_n(t)$ in the form of a traveling wave (front) 
moving on a logarithmic time scale: 

$$S_n(t) \approx S_n^{\rm st}\,\Phi(x), \qquad x = n - \xi_d\ln\gamma t. \tag{B.1}$$

The scaling function $\Phi(x)$ is expected to decrease from 1 in the $x \to -\infty$ limit to 0 in 
the $x \to +\infty$ limit. Both models have to be dealt with separately. 
• Model I: 

The function $\Phi(x)$ describing the front obeys the equation 

$$-2\gamma\,\Phi'(x) = \mu_d\,e^{-\mu_d x}\bigl(\alpha e^{-\mu_s}\Phi(x+1) - (\alpha e^{\mu_d} + \beta + \gamma)\Phi(x) + \gamma e^{\mu_s+\mu_d}\Phi(x-1)\bigr). \tag{B.2}$$

Along the lines of Appendix A, we introduce the Laplace transform $L_\Phi(s)$ of $\Phi(x)$, which 
obeys the functional equation 

$$-2\gamma\,s\,L_\Phi(s) = \mu_d\bigl(\alpha e^{-\mu_s+\mu_d+s} - \alpha e^{\mu_d} - \beta - \gamma + \gamma e^{\mu_s-s}\bigr)L_\Phi(s+\mu_d). \tag{B.3}$$

The expected behaviour of $\Phi(x)$ implies that $L_\Phi(s)$ is analytic for s < 0 and has a simple 
pole at s = 0 with unit residue (i.e., $\lim_{s\to 0}(sL_\Phi(s)) = 1$). The property that $L_\Phi(s)$ has 
no pole at $s = -\mu_d$ implies that the expression inside the parentheses on the right-hand 
side of (B.3) vanishes for $s = -\mu_d$. We thus recover (4.9). The function $L_\Phi(s)$ can be 
given as an explicit expression similar to (A.9), involving two infinite products, which 
will not be needed in the following. 

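• Model II: 

The analysis is very similar. The function $\Phi(x)$ obeys the equation 

$$-2\gamma\,\Phi'(x) = \mu_d\,e^{-\mu_d x}\bigl(\alpha e^{-\mu_s}\Phi(x+1) - (\alpha e^{\mu_d} + \gamma)\Phi(x) + \gamma e^{\mu_s+\mu_d}\Phi(x-1)\bigr). \tag{B.4}$$

The Laplace transform $L_\Phi(s)$ obeys 

$$-2\gamma\,s\,L_\Phi(s) = \mu_d\,(1 - e^{\mu_s-s})\bigl(\alpha e^{-\mu_s+\mu_d+s} - \gamma\bigr)L_\Phi(s+\mu_d). \tag{B.5}$$

The absence of a pole at $s = -\mu_d$ implies $\gamma = \alpha e^{-\mu_s}$. We thus recover (4.10). 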

Level-resolved and total polarisations 

We now turn to the analysis of the level-resolved polarisations $D_n(t)$ and of the total 
polarisation $D(t)$. We anticipate a power-law decay in the long-time regime. Thus, we 
look for an asymptotic solution to (4.2) or (4.4) for the $D_n(t)$ in the form of a power-law 
decay, with a positive exponent $\Theta$, which multiplies a logarithmic front: 

$$D_n(t) \approx t^{-\Theta}\,\Psi(x), \qquad x = n - \xi_d\ln\gamma t. \tag{B.6}$$

The scaling function $\Psi(x)$ is expected to decrease fast enough as $x \to +\infty$, in such a 
way that the total polarisation of the synapse also falls off as $D(t) \sim t^{-\Theta}$. Both models 
again have to be dealt with separately. 

• Model I: 

The function $\Psi(x)$ obeys the equation 

$$-2\gamma\bigl(\Psi'(x) + \Theta\mu_d\Psi(x)\bigr) = \mu_d\,e^{-\mu_d x}\bigl(\alpha\Psi(x+1) - (\alpha e^{\mu_d} + \beta + \gamma)\Psi(x) + \gamma e^{\mu_d}\Psi(x-1)\bigr). \tag{B.7}$$

The Laplace transform $L_\Psi(s)$ of $\Psi(x)$ obeys the functional equation 

$$-2\gamma\,(s + \Theta\mu_d)\,L_\Psi(s) = \mu_d\bigl(\alpha e^{\mu_d+s} - \alpha e^{\mu_d} - \beta - \gamma + \gamma e^{-s}\bigr)L_\Psi(s+\mu_d). \tag{B.8}$$

The fast decay of $\Psi(x)$ as $x \to +\infty$ implies that $L_\Psi(s)$ is analytic at least for s < 0. The 
absence of a pole at $s = -\Theta\mu_d$ yields $\alpha e^{(1-\Theta)\mu_d} - \alpha e^{\mu_d} - \beta - \gamma + \gamma e^{\Theta\mu_d} = 0$. Using (4.9), 
this condition simplifies to 

$$(e^{\Theta\mu_d} - e^{\mu_s+\mu_d})(\gamma e^{\Theta\mu_d} - \alpha e^{-\mu_s}) = 0. \tag{B.9}$$

The vanishing of the first factor⁺ leads to the simple result that $\Theta$ is identical to the 
universal forgetting exponent $\theta$ (see (5.11)). We have thus shown that Model I exhibits 
a remarkably universal power-law forgetting. 
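
The step from the pole condition to (B.9) can be made explicit. The worked algebra below uses only that condition and the relation (4.9), taken here in the form implied by the vanishing of the parentheses of (B.3) at $s = -\mu_d$, namely $\alpha e^{\mu_d} + \beta + \gamma = \alpha e^{-\mu_s} + \gamma e^{\mu_s+\mu_d}$; the shorthand $y = e^{\Theta\mu_d}$ is introduced for convenience only.

```latex
\begin{aligned}
0 &= \alpha e^{(1-\Theta)\mu_d} - \alpha e^{\mu_d} - \beta - \gamma + \gamma e^{\Theta\mu_d}
   \qquad (\text{multiply by } y = e^{\Theta\mu_d}) \\
0 &= \gamma y^2 - (\alpha e^{\mu_d} + \beta + \gamma)\,y + \alpha e^{\mu_d} \\
  &= \gamma y^2 - (\alpha e^{-\mu_s} + \gamma e^{\mu_s+\mu_d})\,y + \alpha e^{\mu_d}
   \qquad (\text{by } (4.9)) \\
  &= (y - e^{\mu_s+\mu_d})(\gamma y - \alpha e^{-\mu_s}).
\end{aligned}
```

The root $y = e^{\mu_s+\mu_d}$ corresponds to $\Theta\mu_d = \mu_s + \mu_d$, i.e., to a value of $\Theta$ that does not depend on $\beta$.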
• Model II: 

The analysis is similar, although it leads to a very different outcome. The function 
$\Psi(x)$ obeys the equation 

$$-2\gamma\bigl(\Psi'(x) + \Theta\mu_d\Psi(x)\bigr) = \mu_d\,e^{-\mu_d x}\bigl(\alpha\Psi(x+1) - (\alpha e^{\mu_d} + 2\beta + \gamma)\Psi(x) + \gamma e^{\mu_d}\Psi(x-1)\bigr). \tag{B.10}$$

The Laplace transform $L_\Psi(s)$ of $\Psi(x)$ obeys the functional equation 

$$-2\gamma\,(s + \Theta\mu_d)\,L_\Psi(s) = \mu_d\bigl(\alpha e^{\mu_d+s} - \alpha e^{\mu_d} - 2\beta - \gamma + \gamma e^{-s}\bigr)L_\Psi(s+\mu_d). \tag{B.11}$$

The absence of a pole at $s = -\Theta\mu_d$ yields $\alpha e^{(1-\Theta)\mu_d} - \alpha e^{\mu_d} - 2\beta - \gamma + \gamma e^{\Theta\mu_d} = 0$. 
Using (4.10), this condition simplifies to 

$$\gamma\,(e^{\Theta\mu_d} - e^{\mu_s+\mu_d})(1 - e^{-\Theta\mu_d}) = 2\beta. \tag{B.12}$$

In contrast to Model I, we now obtain a transient forgetting exponent 

$$\Theta = \frac{1}{\mu_d}\,\ln\!\left[\frac{\beta}{\gamma} + \frac12\left(e^{\mu_s+\mu_d} + 1 + \sqrt{\Bigl(e^{\mu_s+\mu_d} + 1 + \frac{2\beta}{\gamma}\Bigr)^2 - 4e^{\mu_s+\mu_d}}\,\right)\right], \tag{B.13}$$

which depends continuously on $\beta$. It turns out to be a strongly increasing function of $\beta$, 
starting from the universal value $\theta$ in the $\beta \to 0$ limit as 

$$\Theta = \theta + \frac{2\beta}{(e^{\mu_s+\mu_d} - 1)\,\mu_d\,\gamma} + \cdots \tag{B.14}$$

and reaching its maximum for $\beta = \beta_{\max}(\gamma)$ (see (4.16)). Figure B1 shows a plot of $\Theta$ 
against $\beta$ for the parameters (4.18). 
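
The exponent defined by (B.12) and (B.13) is easy to evaluate numerically. The Python sketch below solves the quadratic equation for $e^{\Theta\mu_d}$ implied by (B.12); the function name is illustrative, and the relations $\mu_s = 1/\xi_s$ and $\mu_d = 1/\xi_d$ are assumed here, on the grounds that they reproduce the values $\theta = 2$ and $\Theta \approx 5.056516$ quoted for $\xi_s = \xi_d = 5$.

```python
import math


def transient_exponent(beta: float, gamma: float, xi_s: float, xi_d: float) -> float:
    """Transient forgetting exponent of Model II from (B.12).

    Assumes mu_s = 1/xi_s and mu_d = 1/xi_d (an assumption of this sketch).
    """
    mu_s, mu_d = 1.0 / xi_s, 1.0 / xi_d
    e = math.exp(mu_s + mu_d)
    # (B.12) is equivalent to y**2 - (e + 1 + 2*beta/gamma)*y + e = 0 for y = exp(Theta*mu_d).
    s = e + 1.0 + 2.0 * beta / gamma
    y = 0.5 * (s + math.sqrt(s * s - 4.0 * e))  # larger root, which tends to e as beta -> 0
    return math.log(y) / mu_d


print(transient_exponent(0.2, 0.5, 5.0, 5.0))  # value quoted in the text: 5.056516
print(transient_exponent(0.0, 0.5, 5.0, 5.0))  # reduces to the universal exponent
```

For small $\beta$ the output can also be compared with the linear law (B.14).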

The occurrence of the non-trivial equation (B.12) for the exponent $\Theta$ in the case of 
Model II can be traced back to the difference in architecture between the two models. For 
Model I, where $\beta$-transitions involve a non-local reinjection to the uppermost level, the 
rates multiplying $S_n$ and $D_n$ for generic n in the right-hand side of (4.2) are identical 
and involve the combination $\alpha_n + \beta_n + \gamma_n$. For Model II, where $\beta$-transitions take place 
locally at any depth, the rates multiplying $S_n$ and $D_n$ for generic n in the right-hand side 
of (4.4) are different, as they are respectively proportional to $\alpha_n + \gamma_n$ and $\alpha_n + 2\beta_n + \gamma_n$, 
so that locally the polarisation $D_n$ relaxes more rapidly than the level occupation $S_n$. 

Finally, the conclusions of this Appendix regarding the forgetting exponents hold 
more generally as soon as the level occupation probabilities in the initial state fall off 
exponentially more rapidly than the profile (4.5) of the default state. 

+ The second factor of (B.9) is positive, as a consequence of (4.9). 

Figure B1. Plot of the non-universal transient forgetting exponent $\Theta$ of Model II 
with the parameters (4.18), against $\beta < \beta_{\max}(\gamma)$. 